Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform a less invasive buffer content replacement operation #24

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jimeh
Copy link

@jimeh jimeh commented May 23, 2020

Borrowed from go-mode's gofmt and releated functions:
https://github.com/dominikh/go-mode.el/blob/734d5232455ffde088021ea5908849ac570e890f/go-mode.el#L1850-L1956

Hence I'm not the original author of most this code, but I have used a
number of borrowed variations of it over the years for hand-rolled
formatting solutions.

It works by passing the current buffer content to diff via STDIN to
compare it against the target file on disk producing a RCS formatted
diff that is easy to parse. It then iterates through the said diff only
modifying relevant lines to make the buffer be identical to the target
file.

In my experience it does a much better job of maintaining cursor
position compared to insert-file-contents, as it only modifies lines
that got corrected, and leaves all other lines in the buffer untouched.

Performance is also excellent and near instant. On my late-2016 13-inch
MacBook Pro with a Dual-Core i7 at 3.3Ghz, I can't really tell the
difference between this solution and insert-file-contents.

I did also test it against the Haskell example from:
#19

Replacing buffer content of Req.hs with that of Req-reformatted.hs
is most of the time just as fast as insert-file-contents (around
50ms), and only sometimes a little slower of around 100-150ms based on
my totally unscientific measurement technique of guesstimating it. I
find that rather impressive as the RCS-formatted diff itself is 17KB.

I also tested the old replace-buffer-contents-based variant to see how
it performs on my machine. It took around 15 seconds.

Borrowed from go-mode's gofmt and releated functions:
https://github.com/dominikh/go-mode.el/blob/734d5232455ffde088021ea5908849ac570e890f/go-mode.el#L1850-L1956

Hence I'm not the original author of most this code, but I have used a
number of borrowed variations of it over the years for hand-rolled
formatting solutions.

It works by passing the current buffer content to `diff` via STDIN to
compare it against the target file on disk producing a RCS formatted
diff that is easy to parse. It then iterates through the said diff only
modifying relevant lines to make the buffer be identical to the target
file.

In my experience it does a much better job of maintaining cursor
position compared to `insert-file-contents`, as it only modifies lines
that got corrected, and leaves all other lines in the buffer untouched.

Performance is also excellent and near instant. On my late-2016 13-inch
MacBook Pro with a Dual-Core i7 at 3.3Ghz, I can't really tell the
difference between this solution and `insert-file-contents`.

I did also test it against the Haskell example from:
purcell#19

Replacing buffer content of `Req.hs` with that of `Req-reformatted.hs`
is most of the time just as fast as `insert-file-contents` (around
50ms), and only sometimes a little slower of around 100-150ms based on
my totally unscientific measurement technique of guesstimating it. I
find that rather impressive as the RCS-formatted diff itself is 17KB.

I also tested the old `replace-buffer-contents`-based variant to see how
it performs on my machine. It took around 15 seconds.
@purcell
Copy link
Owner

purcell commented May 26, 2020

Thanks! Yes, I've seen various incarnations of this RCS diff code in my elisp travels.

Any idea why this is so much faster than replace-buffer-contents? There must be a catch, presumably related to its ability to retain buffer context.

I'm not averse to applying something like this, but I'm also not super keen to vouch for the code's correctness or maintain it in perpetuity. 😃

@wyuenho
Copy link
Contributor

wyuenho commented May 26, 2020

Any idea why this is so much faster than replace-buffer-contents?

Maybe because the major mode doesn't have to reapply font lock to the entire file, imenu doesn't have to reindex and a whole bunch of other file-wide operations can be skipped?

@purcell
Copy link
Owner

purcell commented May 26, 2020

@wyuenho I suspect from this comment that you are unfamiliar with replace-buffer-contents, which is a new function that explicitly aims to minimise such unnecessary modifications in the buffer while replacing its contents.

@jimeh
Copy link
Author

jimeh commented May 26, 2020

@purcell I'm afraid I don't why replace-buffer-contents is so slow, I did have a quick look at the source code, but I know almost no C. If I had to guess though, maybe replace-buffer-contents tries to do as small modifications as possible, leading to tens of thousands of operations for Req.hs. While the RCS-diff approach ends up just operating on whole lines, yielding a lot fewer operations.

I obviously didn't write it, but I've used a variant of this in my own rubocopfmt.el for over three years now without any real issues, except the cursor being moved to the beginning of the current line if the current line is corrected. However it looks like that was fixed in go-mode two years ago, and is hence included in this PR.

Obviously a "works for me" isn't exactly the greatest vote of confidence ever, but it's better than nothing, and hopefully counts for something :)

As for maintaining it moving forward, I'd say it's probably fine to just re-borrow it from go-mode if anything changes over there. That's ignoring or assuming there's no licensing issues though.

Stability wise it looks pretty good too. Doing a git blame, the relevant functions have remained mostly unchanged for 7 years, with a minor fix 2 years ago:

@purcell
Copy link
Owner

purcell commented Aug 14, 2020

Hey, so I've been spending a bit of time doing some design/planning for reformatter.el, and I'm revisiting this PR in that context. Here's what I'm thinking:

  • First we add an :output-format flag to reformatter-define, with possible values raw and rcsdiff. In the case that it is the latter, we assume the formatter's output was an RCS diff, and we use the functions you've provided to apply the changes to the buffer. This allows us to support cases like Support formatters that output patch #20, for shellcheck (cc @lafrenierejm).
  • Next (optionally) we provide a global reformatter-replacement-function custom var which would be a function that updates the current buffer's contents from a file. We could then provide a diff-based function that users could optionally use. We could even make this the default when diff is found to be present on the system, which would allow for wider testing.

I feel like this route would give us the definitely-useful additional functionality needed for programs that output diffs, via the first step, while making it easy to add and test the second part. What do you think?

@jimeh
Copy link
Author

jimeh commented Aug 14, 2020

@purcell Both of those changes sound good to me. Let me know if there's anything I can help with :)

@lassik
Copy link

lassik commented Feb 20, 2021

We need the same thing in lassik/emacs-format-all-the-code#124. What to do? It doesn't make sense for lots of Emacs packages to all carry their own RCS-patching and buffer-replacing code.

format-all and reformatter could possibly be refactored so that format-all definitions are written using reformatter, but that's a much bigger project. I think format-all supports some formatter quirks that are not yet in reformatter.

@purcell
Copy link
Owner

purcell commented Feb 22, 2021

I'm still keen to move forward with this, but I don't have a personal use case and I've been very busy recently, so my time for open source has been extremely limited.

@lassik It's possible that making reformatter--do-region more of a public end point with stable and well-documented options could fit the bill for implementing format-all on top of reformatter.

@lassik
Copy link

lassik commented Feb 24, 2021

See also https://github.com/raxod502/apheleia

@purcell
Copy link
Owner

purcell commented Feb 24, 2021

I feel like a good plan might be to extract the rcs patch code from apheleia into a separate package we could all use. Perhaps @raxod502 would be up for splitting it out, to avoid us needing to do it unilaterally?

@lassik
Copy link

lassik commented Feb 25, 2021

From a conceptual standpoint, shouldn't reformatter do what apheleia does, and then format-all be written on top of the new reformatter? To be sure, the practicalities take a bit of work.

@lassik
Copy link

lassik commented Feb 25, 2021

live-preview has some code to replace a buffer's contents that has nothing to do with code formatters. So we should have a generic replace-buffer package.

On top of that, it would be great if apheleia and reformatter could be merged, and format-all could be based on that.

@raxod502
Copy link

No objection to merging these various formatting packages, or to extracting common code into a library. I do not think I have the bandwidth to do so myself, but will certainly advise any such effort and will be happy to merge pull requests to Apheleia that may be necessary to support the work.

@raxod502
Copy link

No objection to merging these various formatting packages

See also lassik/emacs-format-all-the-code#170, by the way. (I have more time these days, and am revisiting the idea.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants